Abstract
Background Geriatric assessment (GA) is now recommended by the American Society of Clinical Oncology for all older adults with cancer receiving systemic therapy (Dale, JCO, 2023). Although in-person GA with tools such as the Fried frailty phenotype (Journals of Gerontology, 2001) is the gold standard, many patients do not receive it because of time and cost constraints. Screening electronic health records for evidence of frailty with large language models (LLMs) is a computational approach that may overcome these constraints, with in-person follow-up GA reserved for patients who screen positive.
Methods From February 2015 to January 2025, we assessed 843 patients aged ≥73 presenting at Dana-Farber Cancer Institute for an initial consultation for MDS/leukemia, myeloma, or lymphoma. A trained research assistant conducted an assessment using the Fried model, which defines a frailty syndrome by five criteria (self-reported exhaustion, weight loss, low physical activity, slow gait speed, and weakness measured by grip strength). Patients were classified as robust if they had no deficits, prefrail if they had one or two deficits, and frail if they had three or more. For this analysis, we created a cohort of patients with an equal mix of prefrail, frail, and robust classifications and then developed a zero-shot LLM prompt based on the Fried phenotype. Using GPT-4o-2024-05-13 within a HIPAA-compliant digital infrastructure, we applied the prompt for the entire patient cohort to the hematology consult note from the day the in-person GA was performed, as well as to all available notes from the seven days before and after. The prompt required the output to include source text supporting the identification of each of the five frailty criteria.
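The classification rule described above (no deficits is robust, one or two is prefrail, three or more is frail) can be sketched in a few lines of Python. This is only an illustration of the counting rule; the function name `classify_fried`, the criterion keys, and the input format are assumptions made here and are not taken from the study's prompt or code.

```python
from typing import Mapping

# Hypothetical keys for the five Fried criteria named in the Methods.
FRIED_CRITERIA = (
    "exhaustion",
    "weight_loss",
    "low_activity",
    "slow_gait",
    "weak_grip",
)


def classify_fried(deficits: Mapping[str, bool]) -> str:
    """Map per-criterion deficit flags to robust / prefrail / frail.

    Criteria missing from `deficits` are treated as absent, which is an
    assumption of this sketch rather than a rule stated in the abstract.
    """
    count = sum(1 for name in FRIED_CRITERIA if deficits.get(name, False))
    if count == 0:
        return "robust"
    if count <= 2:
        return "prefrail"
    return "frail"


if __name__ == "__main__":
    example = {"exhaustion": True, "slow_gait": True}
    print(classify_fried(example))
```

The same rule could be applied either to the in-person assessment or to the per-criterion flags returned by an LLM, which is what makes the two classifications directly comparable.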
Results Among 582 patients in the cohort who were classified as prefrail or frail by in-person screening, the LLM identified 438 (75.3%) as prefrail or frail. The LLM identified several phrases as evidence of Fried criteria, for example “Her main complaint is fatigue. Sometimes she will sleep up to 12 hours in the evening” for self-reported exhaustion and “Gait: slow” for slow gait speed. In contrast, among 261 patients who were classified in person as robust, the LLM incorrectly considered 166 (63.6%) prefrail or frail. Of these patients, 146 screened positive for low energy (example documentation identified by the LLM: “she complained of being tired”), 81 for unintentional weight loss (“30lbs unintentional weight loss since November”), 47 for low activity (“he is still able to work in his garden but his wife has been mowing the lawn which he normally does with a push lawn mower”), 16 for slow walking speed (“significantly limited ambulatory abilities”), and 11 for low grip strength (“She has pain and perceived weakness with fine motor skills: buttoning clothing, putting on earrings, unscrewing bottles”). False positives also often occurred when the LLM misinterpreted negated symptoms (“Denies unintentional weight loss”) as present.
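The counts reported above can be arranged as a two-by-two screening table, with in-person Fried assessment as the reference. The sketch below derives sensitivity, specificity, and the false-positive rate from those counts alone; the abstract itself reports only the 75.3% and 63.6% figures, so the specificity value is simple arithmetic on the reported numbers, and the function name `screening_metrics` is hypothetical.

```python
def screening_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Return basic screening metrics from 2x2 confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),          # positives correctly flagged
        "specificity": tn / (tn + fp),          # negatives correctly cleared
        "false_positive_rate": fp / (tn + fp),  # negatives wrongly flagged
    }


# Counts taken from the Results: 582 prefrail/frail and 261 robust patients.
tp = 438        # prefrail/frail in person, flagged by the LLM
fn = 582 - tp   # prefrail/frail in person, missed by the LLM
fp = 166        # robust in person, flagged by the LLM
tn = 261 - fp   # robust in person, correctly not flagged

metrics = screening_metrics(tp, fn, fp, tn)

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {value:.1%}")
```

Here "positive" means prefrail or frail, matching how the abstract groups the two categories for screening purposes.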
Conclusions These data suggest that LLMs can leverage clinical documentation in hematology notes to identify most patients with blood cancers who are prefrail or frail based on gold-standard assessment but have more difficulty identifying those who are robust. LLMs may also help find frailty “needles in a haystack,” information that may otherwise be missed during tedious chart review of a large volume of notes when in-person GA is not feasible. Further refinement of LLM prompts, including additional prompt engineering or a 5-shot approach, will likely improve accuracy and limit LLM false positives. More structured and precise clinical documentation of frailty syndromes may also enhance LLM accuracy.